Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove exit module #1384

Merged
merged 1 commit into from
Mar 30, 2016
Merged

Remove exit module #1384

merged 1 commit into from
Mar 30, 2016

Conversation

smfreegard
Copy link
Collaborator

Remove the exit module as it's caused me multiple nasty issues.

The latest issue I've had was extremely hard to diagnose (it required strace) which caused the cluster master to be unable to respawn a crashed child process:

write(2, "events.js:141
      throw er; // Unhandled 'error' event            
Error: This socket is closed.
    at Socket._writeGeneric (net.js:625:19)
    at Socket._write (net.js:678:8)
    at doWrite (_stream_writable.js:291:12)
    at writeOrBuffer (_stream_writable.js:278:5)
    at Socket.Writable.write (_stream_writable.js:206:11)
    at Socket.write (net.js:603:40)
    at /usr/lib/node_modules/Haraka/node_modules/exit/lib/exit.js:25:14
    at Array.forEach (native)
    at exit (/usr/lib/node_modules/Haraka/node_modules/exit/lib/exit.js:20:11)
    at process.<anonymous> (/usr/lib/node_modules/Haraka/haraka.js:41:5)
", 622) = -1 EPIPE (Broken pipe)
--- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=19697, si_uid=0} ---

It looks like what happened is that exit() somehow manages to close stdin/stdout/stderr in the master process when one of the children hit an unhandled exception and this prevented all future children spawned by master from inheriting them and therefore at start-up when they attempted to write to the console, they immediately hit an EPIPE exception and would then crash out again. Rinse repeat.

@msimerson
Copy link
Member

Reverts #1296

@msimerson
Copy link
Member

Previous bug/misfeature discovered in exit: #1364

@smfreegard smfreegard force-pushed the remove_exit branch 2 times, most recently from 9c6f6b4 to d6c77ac Compare March 10, 2016 23:09
@baudehlo
Copy link
Collaborator

Is there a better fix we should look at?

@smfreegard
Copy link
Collaborator Author

@baudehlo How do you mean? As you pointed out, the bug that exit() supposedly worked around was fixed in Node anyway.

@msimerson
Copy link
Member

Is there a better fix we should look at?

Well, #1296 did address a real issue that also resulted in difficult-to-track-down errors. The fact that it also introduced a couple is a cost of accepting PRs without proper test coverage.

There's two ways to go about this:

  1. Revert to previous known-buggy but less problematic than current version
  2. Add some proper test coverage to specifically address the issues that Feature/improved exit logging #1296 aimed to fix as well as a series of tests that make sure that the new exit behavior is working as intended when processes exit normally and when they crash.

@baudehlo
Copy link
Collaborator

The problem with #2 is I suspect this is platform and node version dependant. I've never seen it (but then I'm still stuck on v0.10 everywhere due to shitty compilers).

On Mar 10, 2016, at 6:35 PM, Matt Simerson [email protected] wrote:

Is there a better fix we should look at?

Well, #1296 did address a real issue that also resulted in difficult-to-track-down errors. The fact that it also introduced a couple is a cost of accepting PRs without proper test coverage.

There's two ways to go about this:

Revert to previous known-buggy but less problematic than current version
Add some proper test coverage to specifically address the issues that #1296 aimed to fix as well as a series of tests that make sure that the new exit behavior is working as intended when a processes exit normally and when they crash.

Reply to this email directly or view it on GitHub.

@smfreegard
Copy link
Collaborator Author

Being as we don't have any integration tests at the moment, exposing either the original issue or the nasty crashes that I've had would is well beyond my skill to do so (e.g. how the heck within an integration test framework do you spool up Haraka, deliberately cause one of the children an exception, then detect that whilst it's still running, the master is endlessly spawning children which die off immediately), likewise how do you write to stdout but detect that due to async issues it wasn't actually written out before the process exited.

As far as I'm concerned we're better off reverting this ASAP as it's currently horribly broken.

I'd rather look at the original issue and generate an EACCESS issue and then see if it's fixable by moving some of the code around in haraka.js or server.js as I seriously don't like the exit() module now, it's a horrible hack.

@baudehlo
Copy link
Collaborator

+1

On Thu, Mar 10, 2016 at 6:57 PM, Steve Freegard [email protected]
wrote:

Being as we don't have any integration tests at the moment, exposing
either the original issue or the nasty crashes that I've had would is well
beyond my skill to do so (e.g. how the heck within an integration test
framework do you spool up Haraka, deliberately cause one of the children an
exception, then detect that whilst it's still running, the master is
endlessly spawning children which die off immediately), likewise how do you
write to stdout but detect that due to async issues it wasn't actually
written out before the process exited.

As far as I'm concerned we're better off reverting this ASAP as it's
currently horribly broken.

I'd rather look at the original issue and generate an EACCESS issue and
then see if it's fixable by moving some of the code around in haraka.js or
server.js as I seriously don't like the exit() module now, it's a horrible
hack.


Reply to this email directly or view it on GitHub
#1384 (comment).

@baudehlo
Copy link
Collaborator

If we decide to revisit this, it seems one option might be: if (I am a cluster child) process.exit() else library_exit()

On Mar 10, 2016, at 7:08 PM, Matt [email protected] wrote:

+1

On Thu, Mar 10, 2016 at 6:57 PM, Steve Freegard [email protected] wrote:
Being as we don't have any integration tests at the moment, exposing either the original issue or the nasty crashes that I've had would is well beyond my skill to do so (e.g. how the heck within an integration test framework do you spool up Haraka, deliberately cause one of the children an exception, then detect that whilst it's still running, the master is endlessly spawning children which die off immediately), likewise how do you write to stdout but detect that due to async issues it wasn't actually written out before the process exited.

As far as I'm concerned we're better off reverting this ASAP as it's currently horribly broken.

I'd rather look at the original issue and generate an EACCESS issue and then see if it's fixable by moving some of the code around in haraka.js or server.js as I seriously don't like the exit() module now, it's a horrible hack.


Reply to this email directly or view it on GitHub.

@smfreegard
Copy link
Collaborator Author

Ok - I've spent some more time on this today. I was able to reproduce #1296 on iojs 1.8.4 - I only get the full output some of the time:

[smf@dmxscan1 Haraka]$ node haraka.js 
loaded TLD files: 1=1233 2=5914 3=2315
loaded 7847 Public Suffixes
loglevel: LOGPROTOCOL
Starting up Haraka version 2.8.0-alpha.7
[INFO] [-] [core] Loading plugins
[INFO] [-] [core] Loading plugin: access
[DEBUG] [-] [core] no timeout in access.timeout
[DEBUG] [-] [core] no timeout in plugin_timeout
[DEBUG] [-] [core] plugin access timeout is: 30s
[INFO] [-] [access] skipping helo.checks.regexps
[DEBUG] [-] [core] registered hook connect to access.rdns_access priority 0
[DEBUG] [-] [core] registered hook mail to access.mail_from_access priority 0
[DEBUG] [-] [core] registered hook rcpt to access.rcpt_to_access priority 0
[DEBUG] [-] [core] registered hook connect to access.any priority 0
[DEBUG] [-] [core] registered hook helo to access.any priority 0
[DEBUG] [-] [core] registered hook ehlo to access.any priority 0
[DEBUG] [-] [core] registered hook mail to access.any priority 0
[DEBUG] [-] [core] registered hook rcpt to access.any priority 0
[DEBUG] [-] [core] registered hook data_post to access.data_any priority 0
[INFO] [-] [core] Loading plugin: dnsbl
[DEBUG] [-] [core] no timeout in dnsbl.timeout
[DEBUG] [-] [core] no timeout in plugin_timeout
[DEBUG] [-] [core] plugin dnsbl timeout is: 30s
[DEBUG] [-] [core] no timeout in dns_list_base.timeout
[DEBUG] [-] [core] no timeout in plugin_timeout
[DEBUG] [-] [core] plugin dns_list_base timeout is: 30s
[DEBUG] [-] [core] Returning boolean true for main.reject=true
[DEBUG] [-] [dnsbl] looking up: 1.0.0.127.zen.spamhaus.org.
[DEBUG] [-] [dnsbl] will re-test list zones every 30 minutes
[DEBUG] [-] [core] registered hook connect to dnsbl.connect_first priority 0
[INFO] [-] [core] Loading plugin: helo.checks
[DEBUG] [-] [core] no timeout in helo.checks.timeout
[DEBUG] [-] [core] no timeout in plugin_timeout
[DEBUG] [-] [core] plugin helo.checks timeout is: 30s
[DEBUG] [-] [core] Returning boolean true for check.match_re=true
[DEBUG] [-] [core] Returning boolean true for check.bare_ip=true
[DEBUG] [-] [core] Returning boolean true for check.dynamic=true
[DEBUG] [-] [core] Returning boolean true for check.big_company=true
[DEBUG] [-] [core] Returning boolean true for check.valid_hostname=true
[DEBUG] [-] [core] Returning boolean true for check.forward_dns=true
[DEBUG] [-] [core] Returning boolean true for check.rdns_match=true
[DEBUG] [-] [core] Returning boolean true for check.mismatch=true
[DEBUG] [-] [core] Returning boolean true for check.proto_mismatch=true
[DEBUG] [-] [core] Returning boolean true for reject.mismatch=true
[DEBUG] [-] [core] Returning boolean false for reject.proto_mismatch=false
[DEBUG] [-] [core] Returning boolean false for reject.rdns_match=false
[DEBUG] [-] [core] Returning boolean false for reject.dynamic=false
[DEBUG] [-] [core] Returning boolean false for reject.bare_ip=false
[DEBUG] [-] [core] Returning boolean false for reject.literal_mismatch=false
[DEBUG] [-] [core] Returning boolean false for reject.valid_hostname=false
[DEBUG] [-] [core] Returning boolean false for reject.forward_dns=false
[DEBUG] [-] [core] Returning boolean true for reject.big_company=true
[DEBUG] [-] [core] Returning boolean true for skip.private_ip=true
[DEBUG] [-] [core] Returning boolean true for skip.relaying=true
[DEBUG] [-] [core] registered hook helo to helo.checks.proto_mismatch_smtp priority 0
[DEBUG] [-] [core] registered hook ehlo to helo.checks.proto_mismatch_esmtp priority 0
[DEBUG] [-] [core] registered hook helo to helo.checks.init priority 0
[DEBUG] [-] [core] registered hook ehlo to helo.checks.init priority 0
[DEBUG] [-] [core] registered hook helo to helo.checks.match_re priority 0
[DEBUG] [-] [core] registered hook ehlo to helo.checks.match_re priority 0
[DEBUG] [-] [core] registered hook helo to helo.checks.bare_ip priority 0
[DEBUG] [-] [core] registered hook ehlo to helo.checks.bare_ip priority 0
[DEBUG] [-] [core] registered hook helo to helo.checks.dynamic priority 0
[DEBUG] [-] [core] registered hook ehlo to helo.checks.dynamic priority 0
[DEBUG] [-] [core] registered hook helo to helo.checks.big_company priority 0
[DEBUG] [-] [core] registered hook ehlo to helo.checks.big_company priority 0
[DEBUG] [-] [core] registered hook helo to helo.checks.literal_mismatch priority 0
[DEBUG] [-] [core] registered hook ehlo to helo.checks.literal_mismatch priority 0
[DEBUG] [-] [core] registered hook helo to helo.checks.valid_hostname priority 0
[DEBUG] [-] [core] registered hook ehlo to helo.checks.valid_hostname priority 0
[DEBUG] [-] [core] registered hook helo to helo.checks.rdns_match priority 0
[DEBUG] [-] [core] registered hook ehlo to helo.checks.rdns_match priority 0
[DEBUG] [-] [core] registered hook helo to helo.checks.forward_dns priority 0
[DEBUG] [-] [core] registered hook ehlo to helo.checks.forward_dns priority 0
[INFO] [-] [core] Loading plugin: mail_from.is_resolvable
[DEBUG] [-] [core] no timeout in mail_from.is_resolvable.timeout
[DEBUG] [-] [core] no timeout in plugin_timeout
[DEBUG] [-] [core] plugin mail_from.is_resolvable timeout is: 30s
[DEBUG] [-] [core] Returning boolean false for main.allow_mx_ip=0
[DEBUG] [-] [core] Returning boolean true for main.reject_no_mx=1
[DEBUG] [-] [core] registered hook mail to mail_from.is_resolvable.hook_mail priority 0
[INFO] [-] [core] Loading plugin: rcpt_to.in_host_list
[DEBUG] [-] [core] no timeout in rcpt_to.in_host_list.timeout
[DEBUG] [-] [core] no timeout in plugin_timeout
[DEBUG] [-] [core] plugin rcpt_to.in_host_list timeout is: 30s
[DEBUG] [-] [core] no timeout in rcpt_to.host_list_base.timeout
[DEBUG] [-] [core] no timeout in plugin_timeout
[DEBUG] [-] [core] plugin rcpt_to.host_list_base timeout is: 30s
[DEBUG] [-] [core] registered hook rcpt to rcpt_to.in_host_list.hook_rcpt priority 0
[DEBUG] [-] [core] registered hook mail to rcpt_to.in_host_list.hook_mail priority 0
[INFO] [-] [core] Loading plugin: data.headers
[DEBUG] [-] [core] no timeout in data.headers.timeout
[DEBUG] [-] [core] no timeout in plugin_timeout
[DEBUG] [-] [core] plugin data.headers timeout is: 30s
[DEBUG] [-] [core] registered hook data_post to data.headers.duplicate_singular priority 0
[DEBUG] [-] [core] registered hook data_post to data.headers.missing_required priority 0
[DEBUG] [-] [core] registered hook data_post to data.headers.invalid_date priority 0
[DEBUG] [-] [core] registered hook data_post to data.headers.invalid_return_path priority 0
[DEBUG] [-] [core] registered hook data_post to data.headers.user_agent priority 0
[DEBUG] [-] [core] registered hook data_post to data.headers.direct_to_mx priority 0
[DEBUG] [-] [core] registered hook data_post to data.headers.from_match priority 0
[DEBUG] [-] [core] registered hook data_post to data.headers.delivered_to priority 0
[DEBUG] [-] [core] registered hook data_post to data.headers.mailing_list priority 0[smf@dmxscan1 Haraka]$

The workaround is to pipe to cat (which highlights a bug in logger.dump_logs()):

[smf@dmxscan1 Haraka]$ node haraka.js  | cat
loaded TLD files: 1=1233 2=5914 3=2315
loaded 7847 Public Suffixes
[ERROR] [-] [core] Failed to setup listeners: listen EACCES ::0:25
[NOTICE] [-] [core] Shutting down

The "missing" output in the latter is because logger.dump_logs only dumps the logs when process.stdout.isTTY is true, which looks very much like a bug to me.

If we decide to revisit this, it seems one option might be: if (I am a cluster child) process.exit() else library_exit()

True, but in my case this all blew up in haraka.js, so we'd have to use environment variables to detect if we're parent or child. My concern would be what other bugs lay down that route, I couldn't reliably reproduce the conditions that caused my crashes which is why it took so long to find it.

I was able to reliably get all log output without any pipes of workarounds by doing this:

logger.dump_logs = function (cb) {
    while (logger.deferred_logs.length > 0) {
        var log_item = logger.deferred_logs.shift();
        var color = logger.colors[log_item.level];
        if (color && stdout_is_tty) {
            console.log(logger.colorize(color,log_item.data));
        }
    }
    // Run callback function once everything has been written
    if (cb) {
        process.stdout.write('', cb);
    }
    return true;
};

And then changing the calls to it like so:

            logger.logerror("Failed to setup listeners: " + err.message);
            logger.dump_logs(function () {
                process.exit(1);
            });
            return;

I admit I don't really understand why that works See nodejs/node#2972 (comment)

If that's an acceptable workaround, then I'd rather do it this way than to use exit().

@baudehlo
Copy link
Collaborator

+1

On Mar 11, 2016, at 7:22 AM, Steve Freegard [email protected] wrote:

Ok - I've spent some more time on this today. I was able to reproduce #1296 on iojs 1.8.4 - I only get the full output some of the time:

[smf@dmxscan1 Haraka]$ node haraka.js
loaded TLD files: 1=1233 2=5914 3=2315
loaded 7847 Public Suffixes
loglevel: LOGPROTOCOL
Starting up Haraka version 2.8.0-alpha.7
[INFO] [-] [core] Loading plugins
[INFO] [-] [core] Loading plugin: access
[DEBUG] [-] [core] no timeout in access.timeout
[DEBUG] [-] [core] no timeout in plugin_timeout
[DEBUG] [-] [core] plugin access timeout is: 30s
[INFO] [-] [access] skipping helo.checks.regexps
[DEBUG] [-] [core] registered hook connect to access.rdns_access priority 0
[DEBUG] [-] [core] registered hook mail to access.mail_from_access priority 0
[DEBUG] [-] [core] registered hook rcpt to access.rcpt_to_access priority 0
[DEBUG] [-] [core] registered hook connect to access.any priority 0
[DEBUG] [-] [core] registered hook helo to access.any priority 0
[DEBUG] [-] [core] registered hook ehlo to access.any priority 0
[DEBUG] [-] [core] registered hook mail to access.any priority 0
[DEBUG] [-] [core] registered hook rcpt to access.any priority 0
[DEBUG] [-] [core] registered hook data_post to access.data_any priority 0
[INFO] [-] [core] Loading plugin: dnsbl
[DEBUG] [-] [core] no timeout in dnsbl.timeout
[DEBUG] [-] [core] no timeout in plugin_timeout
[DEBUG] [-] [core] plugin dnsbl timeout is: 30s
[DEBUG] [-] [core] no timeout in dns_list_base.timeout
[DEBUG] [-] [core] no timeout in plugin_timeout
[DEBUG] [-] [core] plugin dns_list_base timeout is: 30s
[DEBUG] [-] [core] Returning boolean true for main.reject=true
[DEBUG] [-] [dnsbl] looking up: 1.0.0.127.zen.spamhaus.org.
[DEBUG] [-] [dnsbl] will re-test list zones every 30 minutes
[DEBUG] [-] [core] registered hook connect to dnsbl.connect_first priority 0
[INFO] [-] [core] Loading plugin: helo.checks
[DEBUG] [-] [core] no timeout in helo.checks.timeout
[DEBUG] [-] [core] no timeout in plugin_timeout
[DEBUG] [-] [core] plugin helo.checks timeout is: 30s
[DEBUG] [-] [core] Returning boolean true for check.match_re=true
[DEBUG] [-] [core] Returning boolean true for check.bare_ip=true
[DEBUG] [-] [core] Returning boolean true for check.dynamic=true
[DEBUG] [-] [core] Returning boolean true for check.big_company=true
[DEBUG] [-] [core] Returning boolean true for check.valid_hostname=true
[DEBUG] [-] [core] Returning boolean true for check.forward_dns=true
[DEBUG] [-] [core] Returning boolean true for check.rdns_match=true
[DEBUG] [-] [core] Returning boolean true for check.mismatch=true
[DEBUG] [-] [core] Returning boolean true for check.proto_mismatch=true
[DEBUG] [-] [core] Returning boolean true for reject.mismatch=true
[DEBUG] [-] [core] Returning boolean false for reject.proto_mismatch=false
[DEBUG] [-] [core] Returning boolean false for reject.rdns_match=false
[DEBUG] [-] [core] Returning boolean false for reject.dynamic=false
[DEBUG] [-] [core] Returning boolean false for reject.bare_ip=false
[DEBUG] [-] [core] Returning boolean false for reject.literal_mismatch=false
[DEBUG] [-] [core] Returning boolean false for reject.valid_hostname=false
[DEBUG] [-] [core] Returning boolean false for reject.forward_dns=false
[DEBUG] [-] [core] Returning boolean true for reject.big_company=true
[DEBUG] [-] [core] Returning boolean true for skip.private_ip=true
[DEBUG] [-] [core] Returning boolean true for skip.relaying=true
[DEBUG] [-] [core] registered hook helo to helo.checks.proto_mismatch_smtp priority 0
[DEBUG] [-] [core] registered hook ehlo to helo.checks.proto_mismatch_esmtp priority 0
[DEBUG] [-] [core] registered hook helo to helo.checks.init priority 0
[DEBUG] [-] [core] registered hook ehlo to helo.checks.init priority 0
[DEBUG] [-] [core] registered hook helo to helo.checks.match_re priority 0
[DEBUG] [-] [core] registered hook ehlo to helo.checks.match_re priority 0
[DEBUG] [-] [core] registered hook helo to helo.checks.bare_ip priority 0
[DEBUG] [-] [core] registered hook ehlo to helo.checks.bare_ip priority 0
[DEBUG] [-] [core] registered hook helo to helo.checks.dynamic priority 0
[DEBUG] [-] [core] registered hook ehlo to helo.checks.dynamic priority 0
[DEBUG] [-] [core] registered hook helo to helo.checks.big_company priority 0
[DEBUG] [-] [core] registered hook ehlo to helo.checks.big_company priority 0
[DEBUG] [-] [core] registered hook helo to helo.checks.literal_mismatch priority 0
[DEBUG] [-] [core] registered hook ehlo to helo.checks.literal_mismatch priority 0
[DEBUG] [-] [core] registered hook helo to helo.checks.valid_hostname priority 0
[DEBUG] [-] [core] registered hook ehlo to helo.checks.valid_hostname priority 0
[DEBUG] [-] [core] registered hook helo to helo.checks.rdns_match priority 0
[DEBUG] [-] [core] registered hook ehlo to helo.checks.rdns_match priority 0
[DEBUG] [-] [core] registered hook helo to helo.checks.forward_dns priority 0
[DEBUG] [-] [core] registered hook ehlo to helo.checks.forward_dns priority 0
[INFO] [-] [core] Loading plugin: mail_from.is_resolvable
[DEBUG] [-] [core] no timeout in mail_from.is_resolvable.timeout
[DEBUG] [-] [core] no timeout in plugin_timeout
[DEBUG] [-] [core] plugin mail_from.is_resolvable timeout is: 30s
[DEBUG] [-] [core] Returning boolean false for main.allow_mx_ip=0
[DEBUG] [-] [core] Returning boolean true for main.reject_no_mx=1
[DEBUG] [-] [core] registered hook mail to mail_from.is_resolvable.hook_mail priority 0
[INFO] [-] [core] Loading plugin: rcpt_to.in_host_list
[DEBUG] [-] [core] no timeout in rcpt_to.in_host_list.timeout
[DEBUG] [-] [core] no timeout in plugin_timeout
[DEBUG] [-] [core] plugin rcpt_to.in_host_list timeout is: 30s
[DEBUG] [-] [core] no timeout in rcpt_to.host_list_base.timeout
[DEBUG] [-] [core] no timeout in plugin_timeout
[DEBUG] [-] [core] plugin rcpt_to.host_list_base timeout is: 30s
[DEBUG] [-] [core] registered hook rcpt to rcpt_to.in_host_list.hook_rcpt priority 0
[DEBUG] [-] [core] registered hook mail to rcpt_to.in_host_list.hook_mail priority 0
[INFO] [-] [core] Loading plugin: data.headers
[DEBUG] [-] [core] no timeout in data.headers.timeout
[DEBUG] [-] [core] no timeout in plugin_timeout
[DEBUG] [-] [core] plugin data.headers timeout is: 30s
[DEBUG] [-] [core] registered hook data_post to data.headers.duplicate_singular priority 0
[DEBUG] [-] [core] registered hook data_post to data.headers.missing_required priority 0
[DEBUG] [-] [core] registered hook data_post to data.headers.invalid_date priority 0
[DEBUG] [-] [core] registered hook data_post to data.headers.invalid_return_path priority 0
[DEBUG] [-] [core] registered hook data_post to data.headers.user_agent priority 0
[DEBUG] [-] [core] registered hook data_post to data.headers.direct_to_mx priority 0
[DEBUG] [-] [core] registered hook data_post to data.headers.from_match priority 0
[DEBUG] [-] [core] registered hook data_post to data.headers.delivered_to priority 0
[DEBUG] [-] [core] registered hook data_post to data.headers.mailing_list priority 0[smf@dmxscan1 Haraka]$
The workaround is to pipe to cat (which highlights a bug in logger.dump_logs()):

[smf@dmxscan1 Haraka]$ node haraka.js | cat
loaded TLD files: 1=1233 2=5914 3=2315
loaded 7847 Public Suffixes
[ERROR] [-] [core] Failed to setup listeners: listen EACCES ::0:25
[NOTICE] [-] [core] Shutting down
The "missing" output in the latter is because logger.dump_logs only dumps the logs when process.stdout.isTTY is true, which looks very much like a bug to me.

If we decide to revisit this, it seems one option might be: if (I am a cluster child) process.exit() else library_exit()

True, but in my case this all blew up in haraka.js, so we'd have to use environment variables to detect if we're parent or child. My concern would be what other bugs lay down that route, I couldn't reliably reproduce the conditions that caused my crashes which is why it took so long to find it.

I was able to reliably get all log output without any pipes of workarounds by doing this:

logger.dump_logs = function (cb) {
while (logger.deferred_logs.length > 0) {
var log_item = logger.deferred_logs.shift();
var color = logger.colors[log_item.level];
if (color && stdout_is_tty) {
console.log(logger.colorize(color,log_item.data));
}
}
// Run callback function once everything has been written
if (cb) {
process.stdout.write('', cb);
}
return true;
};
And then changing the calls to it like so:

        logger.logerror("Failed to setup listeners: " + err.message);
        logger.dump_logs(function () {
            process.exit(1);
        });
        return;

I admit I don't really understand why that works See nodejs/node#2972 (comment)

If that's an acceptable workaround, then I'd rather do it this way than to use exit().


Reply to this email directly or view it on GitHub.

@smfreegard
Copy link
Collaborator Author

Ok - I've refactored the necessary changes and created a new logger.dump_and_exit(code) method which is used instead.

This fixes the issue reported in #1296 and reverts the changes it made.

@msimerson msimerson merged commit 42c9bd9 into haraka:master Mar 30, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants